Statistical Applications in Genetics and Molecular Biology
Authors
Abstract
We propose a heuristic approach to detecting evidence of recombination and gene conversion in multiple DNA sequence alignments. The method consists of two stages. In the first stage, a sliding window is moved along the alignment, and at each window position phylogenetic trees are sampled with MCMC from the conditional posterior distribution. To reduce the noise intrinsic to inference from the limited data available in a typically short window, a clustering algorithm based on the Robinson-Foulds distance is applied to the sampled trees, and the posterior distribution over tree clusters is obtained for each window position. While changes in this posterior distribution are indicative of recombination or gene conversion events, it is difficult to decide when such a change is statistically significant. This problem is addressed in the second stage, in which the distributions obtained in the first stage are post-processed with a Bayesian hidden Markov model (HMM). The emission states of the HMM are associated with posterior distributions over clusters of phylogenetic tree topologies, and the hidden states indicate putative recombinant segments. Inference is Bayesian: the model parameters are sampled from their posterior distribution with MCMC. Of particular interest is the number of hidden states, which indicates the number of putative recombinant regions. To infer it, we apply reversible jump MCMC and sample the number of hidden states from its posterior distribution.
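The first stage can be illustrated with a small Python sketch. It assumes that each MCMC-sampled tree has already been reduced to its set of nontrivial splits (bipartitions of the taxa), computes the unnormalized Robinson-Foulds distance as the size of the symmetric difference of two split sets, and turns sample frequencies into a per-window posterior over tree clusters. The clustering step is a hypothetical greedy "leader" scheme with an RF threshold, chosen only for brevity; the abstract does not specify the paper's clustering algorithm. Pooling the trees from all windows before clustering, so that cluster labels are shared across window positions, is likewise an assumption, and the names `tree_splits`, `cluster_trees` and `window_cluster_posteriors` are illustrative.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Sequence, Tuple

Split = FrozenSet[str]        # one side of a bipartition of the taxa
SplitSet = FrozenSet[Split]   # an unrooted topology as its set of nontrivial splits


def normalize_split(side: Iterable[str], taxa: FrozenSet[str]) -> Split:
    """Canonical form of a bipartition: the side NOT containing a reference taxon."""
    side = frozenset(side)
    ref = min(taxa)  # convention: the lexicographically smallest taxon is the reference
    return taxa - side if ref in side else side


def tree_splits(clades: Iterable[Iterable[str]], taxa: Iterable[str]) -> SplitSet:
    """Build a topology from clades, each given as one side of an internal edge."""
    taxa = frozenset(taxa)
    splits = set()
    for clade in clades:
        s = normalize_split(clade, taxa)
        if 1 < len(s) < len(taxa) - 1:  # keep only nontrivial splits (both sides >= 2 taxa)
            splits.add(s)
    return frozenset(splits)


def robinson_foulds(a: SplitSet, b: SplitSet) -> int:
    """Unnormalized RF distance: number of splits found in exactly one of the two trees."""
    return len(a ^ b)


def cluster_trees(trees: Sequence[SplitSet], threshold: int) -> Tuple[List[int], List[SplitSet]]:
    """Hypothetical greedy leader clustering: join the first cluster whose leader is
    within `threshold` RF units, otherwise open a new cluster led by this tree."""
    leaders: List[SplitSet] = []
    labels: List[int] = []
    for tree in trees:
        for k, leader in enumerate(leaders):
            if robinson_foulds(tree, leader) <= threshold:
                labels.append(k)
                break
        else:
            leaders.append(tree)
            labels.append(len(leaders) - 1)
    return labels, leaders


def window_cluster_posteriors(samples_per_window: Sequence[Sequence[SplitSet]],
                              threshold: int) -> List[List[float]]:
    """Pool all windows' samples, cluster them once (shared labels), then return,
    for each window, the fraction of its MCMC samples that fall in each cluster."""
    pooled = [tree for window in samples_per_window for tree in window]
    labels, leaders = cluster_trees(pooled, threshold)
    posteriors = []
    start = 0
    for window in samples_per_window:
        n = len(window)
        if n == 0:
            raise ValueError("every window needs at least one sampled tree")
        counts = Counter(labels[start:start + n])
        start += n
        posteriors.append([counts.get(k, 0) / n for k in range(len(leaders))])
    return posteriors


if __name__ == "__main__":
    taxa = "ABCDE"
    t1 = tree_splits([{"A", "B"}, {"D", "E"}], taxa)  # ((A,B),C,(D,E))
    t2 = tree_splits([{"A", "C"}, {"D", "E"}], taxa)  # ((A,C),B,(D,E))
    t3 = tree_splits([{"A", "D"}, {"C", "E"}], taxa)  # ((A,D),B,(C,E))
    windows = [[t1, t1, t1, t2], [t3, t3, t1, t3]]
    print(window_cluster_posteriors(windows, threshold=2))
```

Running the script prints `[[1.0, 0.0], [0.25, 0.75]]`: at threshold 2, the topologies ((A,B),C,(D,E)) and ((A,C),B,(D,E)), which are 2 RF units apart, fall into one cluster, and the posterior mass shifts from that cluster to the ((A,D),B,(C,E)) cluster between the two windows. Deciding whether such a shift is significant is the problem the second-stage HMM addresses.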
Similar resources
Strategies and Clinical Applications of Next Generation Sequencing
Abstract: DNA sequencing is one of the most valuable techniques in molecular biology; it can be used to determine the sequence of nucleotides in a DNA fragment. High-throughput sequencing, known as Next Generation Sequencing (NGS), has revolutionized genomic research and molecular biology; as a result, a whole human genome can be sequenced at low cost within a few days. NGS technology is simi...
SLC2A4 Polymorphisms Can Be a New Molecular Biomarker for Sports Genomics
"SLC2A4 Polymorphisms Can Be a New Molecular Biomarker for Sports Genomics" is an editorial article and has no abstract.
Statistical Applications in Genetics and Molecular Biology
This note is a comment on the article “Dimension Reduction for Classification with Gene Expression Microarray Data” that appeared in Statistical Applications in Genetics and Molecular Biology (Dai et al., 2006).
Expression Analysis of PKS13, FG08079.1 and PKS10 Genes in Fusarium graminearum and Fusarium culmorum
Background: Identification and quantification of mycotoxins produced by Fusarium species are important for controlling fungal diseases. Objectives: The potential for zearalenone, butenolide and fusarin C production was investigated at the molecular level in five Fusarium graminearum and five F. culmorum isolates. Materials and Methods: The presence of the PKS13, FG08079.1 and PKS10 genes, associated with produ...
Molecular Epidemiology of Breast Cancer among Iranian-Azeri Population based on P53 Research
Background: This study aimed to improve our understanding of the molecular and epidemiological features of breast cancer in the Azeri population, with special emphasis on the detection of TP53 mutations. We also analyzed the role of the P53 codon 72 polymorphism (rs1042522) in susceptibility to breast cancer. Methods: ...